{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Dataset** is a abstraction of local file system.\n",
    "Users can add their local paths into this system to easily access the data inside.\n",
    "The basic concept is to treat a data file as a property of a ``Dataset`` object.\n",
    "When users call these properties, ``Dataset`` will load the data files automatically.\n",
    "\n",
    "The following tutorial shows how easy it is to interactive with data in this system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Dataset> includes:\n",
       "\"data1\": /Users/liuchang/projects/xenonpy/samples/set1/data1.pd.xz\n",
       "\"data2\": /Users/liuchang/projects/xenonpy/samples/set2/data2.pd.xz"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from xenonpy.datatools import Dataset\n",
    "\n",
    "# use dir path as parameters when initlization\n",
    "ds = Dataset('set1', 'set2')\n",
    "ds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   0  1\n",
       "0  1  2\n",
       "1  3  4"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# load data\n",
    "\n",
    "ds.data1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Dataset> includes:\n",
       "\"data1\": /Users/liuchang/projects/xenonpy/samples/set1/data1.csv\n",
       "\"data2\": /Users/liuchang/projects/xenonpy/samples/set2/data2.csv"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# change backend\n",
    "\n",
    "ds_csv = ds.csv\n",
    "ds_csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0  0  1\n",
       "0           0  1  2\n",
       "1           1  3  4"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds_csv.data2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Dataset> includes:\n",
       "\"data1\": /Users/liuchang/projects/xenonpy/samples/set1/data1.pkl.z\n",
       "\"data2\": /Users/liuchang/projects/xenonpy/samples/set2/data2.pkl.z"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# set backend at init\n",
    "\n",
    "ds = Dataset('set1', 'set2', backend='pickle')\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Preset\n",
    "\n",
    "Currently, two sets of element-level property data are available (``elements`` and ``elements_completed`` (imputed version of ``elements``))."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atomic_number</th>\n",
       "      <th>atomic_radius</th>\n",
       "      <th>atomic_radius_rahm</th>\n",
       "      <th>atomic_volume</th>\n",
       "      <th>atomic_weight</th>\n",
       "      <th>boiling_point</th>\n",
       "      <th>brinell_hardness</th>\n",
       "      <th>bulk_modulus</th>\n",
       "      <th>c6</th>\n",
       "      <th>c6_gb</th>\n",
       "      <th>...</th>\n",
       "      <th>vdw_radius_bondi</th>\n",
       "      <th>vdw_radius_dreiding</th>\n",
       "      <th>vdw_radius_mm3</th>\n",
       "      <th>vdw_radius_rt</th>\n",
       "      <th>vdw_radius_truhlar</th>\n",
       "      <th>vdw_radius_uff</th>\n",
       "      <th>sound_velocity</th>\n",
       "      <th>vickers_hardness</th>\n",
       "      <th>Polarizability</th>\n",
       "      <th>youngs_modulus</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>H</th>\n",
       "      <td>1</td>\n",
       "      <td>79.0</td>\n",
       "      <td>154.0</td>\n",
       "      <td>14.1</td>\n",
       "      <td>1.008000</td>\n",
       "      <td>20.280</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>6.499027</td>\n",
       "      <td>6.51</td>\n",
       "      <td>...</td>\n",
       "      <td>120.0</td>\n",
       "      <td>319.5</td>\n",
       "      <td>162.0</td>\n",
       "      <td>110.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>288.6</td>\n",
       "      <td>1270.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.666793</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>He</th>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>134.0</td>\n",
       "      <td>31.8</td>\n",
       "      <td>4.002602</td>\n",
       "      <td>4.216</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.420000</td>\n",
       "      <td>1.47</td>\n",
       "      <td>...</td>\n",
       "      <td>140.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>153.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>236.2</td>\n",
       "      <td>970.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.205052</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Li</th>\n",
       "      <td>3</td>\n",
       "      <td>155.0</td>\n",
       "      <td>220.0</td>\n",
       "      <td>13.1</td>\n",
       "      <td>6.940000</td>\n",
       "      <td>1118.150</td>\n",
       "      <td>NaN</td>\n",
       "      <td>11.0</td>\n",
       "      <td>1392.000000</td>\n",
       "      <td>1410.00</td>\n",
       "      <td>...</td>\n",
       "      <td>181.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>255.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>245.1</td>\n",
       "      <td>6000.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>24.330000</td>\n",
       "      <td>4.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Be</th>\n",
       "      <td>4</td>\n",
       "      <td>112.0</td>\n",
       "      <td>219.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>9.012183</td>\n",
       "      <td>3243.000</td>\n",
       "      <td>600.0</td>\n",
       "      <td>130.0</td>\n",
       "      <td>227.000000</td>\n",
       "      <td>214.00</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>223.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>153.0</td>\n",
       "      <td>274.5</td>\n",
       "      <td>13000.0</td>\n",
       "      <td>1670.0</td>\n",
       "      <td>5.600000</td>\n",
       "      <td>287.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td>5</td>\n",
       "      <td>98.0</td>\n",
       "      <td>205.0</td>\n",
       "      <td>4.6</td>\n",
       "      <td>10.810000</td>\n",
       "      <td>3931.000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>320.0</td>\n",
       "      <td>99.500000</td>\n",
       "      <td>99.20</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>402.0</td>\n",
       "      <td>215.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>192.0</td>\n",
       "      <td>408.3</td>\n",
       "      <td>16200.0</td>\n",
       "      <td>49000.0</td>\n",
       "      <td>3.030000</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 74 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    atomic_number  atomic_radius  atomic_radius_rahm  atomic_volume  \\\n",
       "H               1           79.0               154.0           14.1   \n",
       "He              2            NaN               134.0           31.8   \n",
       "Li              3          155.0               220.0           13.1   \n",
       "Be              4          112.0               219.0            5.0   \n",
       "B               5           98.0               205.0            4.6   \n",
       "\n",
       "    atomic_weight  boiling_point  brinell_hardness  bulk_modulus           c6  \\\n",
       "H        1.008000         20.280               NaN           NaN     6.499027   \n",
       "He       4.002602          4.216               NaN           NaN     1.420000   \n",
       "Li       6.940000       1118.150               NaN          11.0  1392.000000   \n",
       "Be       9.012183       3243.000             600.0         130.0   227.000000   \n",
       "B       10.810000       3931.000               NaN         320.0    99.500000   \n",
       "\n",
       "      c6_gb  ...  vdw_radius_bondi  vdw_radius_dreiding  vdw_radius_mm3  \\\n",
       "H      6.51  ...             120.0                319.5           162.0   \n",
       "He     1.47  ...             140.0                  NaN           153.0   \n",
       "Li  1410.00  ...             181.0                  NaN           255.0   \n",
       "Be   214.00  ...               NaN                  NaN           223.0   \n",
       "B     99.20  ...               NaN                402.0           215.0   \n",
       "\n",
       "    vdw_radius_rt  vdw_radius_truhlar  vdw_radius_uff  sound_velocity  \\\n",
       "H           110.0                 NaN           288.6          1270.0   \n",
       "He            NaN                 NaN           236.2           970.0   \n",
       "Li            NaN                 NaN           245.1          6000.0   \n",
       "Be            NaN               153.0           274.5         13000.0   \n",
       "B             NaN               192.0           408.3         16200.0   \n",
       "\n",
       "    vickers_hardness  Polarizability  youngs_modulus  \n",
       "H                NaN        0.666793             NaN  \n",
       "He               NaN        0.205052             NaN  \n",
       "Li               NaN       24.330000             4.9  \n",
       "Be            1670.0        5.600000           287.0  \n",
       "B            49000.0        3.030000             NaN  \n",
       "\n",
       "[5 rows x 74 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from xenonpy.datatools import preset\n",
    "\n",
    "preset.elements.head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>atomic_number</th>\n",
       "      <th>atomic_radius</th>\n",
       "      <th>atomic_radius_rahm</th>\n",
       "      <th>atomic_volume</th>\n",
       "      <th>atomic_weight</th>\n",
       "      <th>boiling_point</th>\n",
       "      <th>bulk_modulus</th>\n",
       "      <th>c6_gb</th>\n",
       "      <th>covalent_radius_cordero</th>\n",
       "      <th>covalent_radius_pyykko</th>\n",
       "      <th>...</th>\n",
       "      <th>num_s_valence</th>\n",
       "      <th>period</th>\n",
       "      <th>specific_heat</th>\n",
       "      <th>thermal_conductivity</th>\n",
       "      <th>vdw_radius</th>\n",
       "      <th>vdw_radius_alvarez</th>\n",
       "      <th>vdw_radius_mm3</th>\n",
       "      <th>vdw_radius_uff</th>\n",
       "      <th>sound_velocity</th>\n",
       "      <th>Polarizability</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>H</th>\n",
       "      <td>1.0</td>\n",
       "      <td>79.000000</td>\n",
       "      <td>154.0</td>\n",
       "      <td>14.1</td>\n",
       "      <td>1.008000</td>\n",
       "      <td>20.280</td>\n",
       "      <td>56.79964</td>\n",
       "      <td>6.51</td>\n",
       "      <td>31.0</td>\n",
       "      <td>32.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.122728</td>\n",
       "      <td>0.1805</td>\n",
       "      <td>110.0</td>\n",
       "      <td>120.0</td>\n",
       "      <td>162.0</td>\n",
       "      <td>288.6</td>\n",
       "      <td>1270.0</td>\n",
       "      <td>0.666793</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>He</th>\n",
       "      <td>2.0</td>\n",
       "      <td>147.832643</td>\n",
       "      <td>134.0</td>\n",
       "      <td>31.8</td>\n",
       "      <td>4.002602</td>\n",
       "      <td>4.216</td>\n",
       "      <td>85.10663</td>\n",
       "      <td>1.47</td>\n",
       "      <td>28.0</td>\n",
       "      <td>46.0</td>\n",
       "      <td>...</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>5.188000</td>\n",
       "      <td>0.1513</td>\n",
       "      <td>140.0</td>\n",
       "      <td>143.0</td>\n",
       "      <td>153.0</td>\n",
       "      <td>236.2</td>\n",
       "      <td>970.0</td>\n",
       "      <td>0.205052</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Li</th>\n",
       "      <td>3.0</td>\n",
       "      <td>155.000000</td>\n",
       "      <td>220.0</td>\n",
       "      <td>13.1</td>\n",
       "      <td>6.940000</td>\n",
       "      <td>1118.150</td>\n",
       "      <td>11.00000</td>\n",
       "      <td>1410.00</td>\n",
       "      <td>128.0</td>\n",
       "      <td>133.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.489000</td>\n",
       "      <td>85.0000</td>\n",
       "      <td>182.0</td>\n",
       "      <td>212.0</td>\n",
       "      <td>255.0</td>\n",
       "      <td>245.1</td>\n",
       "      <td>6000.0</td>\n",
       "      <td>24.330000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Be</th>\n",
       "      <td>4.0</td>\n",
       "      <td>112.000000</td>\n",
       "      <td>219.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>9.012183</td>\n",
       "      <td>3243.000</td>\n",
       "      <td>130.00000</td>\n",
       "      <td>214.00</td>\n",
       "      <td>96.0</td>\n",
       "      <td>102.0</td>\n",
       "      <td>...</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.824000</td>\n",
       "      <td>190.0000</td>\n",
       "      <td>153.0</td>\n",
       "      <td>198.0</td>\n",
       "      <td>223.0</td>\n",
       "      <td>274.5</td>\n",
       "      <td>13000.0</td>\n",
       "      <td>5.600000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td>5.0</td>\n",
       "      <td>98.000000</td>\n",
       "      <td>205.0</td>\n",
       "      <td>4.6</td>\n",
       "      <td>10.810000</td>\n",
       "      <td>3931.000</td>\n",
       "      <td>320.00000</td>\n",
       "      <td>99.20</td>\n",
       "      <td>84.0</td>\n",
       "      <td>85.0</td>\n",
       "      <td>...</td>\n",
       "      <td>2.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.025000</td>\n",
       "      <td>27.0000</td>\n",
       "      <td>192.0</td>\n",
       "      <td>191.0</td>\n",
       "      <td>215.0</td>\n",
       "      <td>408.3</td>\n",
       "      <td>16200.0</td>\n",
       "      <td>3.030000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 58 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    atomic_number  atomic_radius  atomic_radius_rahm  atomic_volume  \\\n",
       "H             1.0      79.000000               154.0           14.1   \n",
       "He            2.0     147.832643               134.0           31.8   \n",
       "Li            3.0     155.000000               220.0           13.1   \n",
       "Be            4.0     112.000000               219.0            5.0   \n",
       "B             5.0      98.000000               205.0            4.6   \n",
       "\n",
       "    atomic_weight  boiling_point  bulk_modulus    c6_gb  \\\n",
       "H        1.008000         20.280      56.79964     6.51   \n",
       "He       4.002602          4.216      85.10663     1.47   \n",
       "Li       6.940000       1118.150      11.00000  1410.00   \n",
       "Be       9.012183       3243.000     130.00000   214.00   \n",
       "B       10.810000       3931.000     320.00000    99.20   \n",
       "\n",
       "    covalent_radius_cordero  covalent_radius_pyykko  ...  num_s_valence  \\\n",
       "H                      31.0                    32.0  ...            1.0   \n",
       "He                     28.0                    46.0  ...            2.0   \n",
       "Li                    128.0                   133.0  ...            1.0   \n",
       "Be                     96.0                   102.0  ...            2.0   \n",
       "B                      84.0                    85.0  ...            2.0   \n",
       "\n",
       "    period  specific_heat  thermal_conductivity  vdw_radius  \\\n",
       "H      1.0       1.122728                0.1805       110.0   \n",
       "He     1.0       5.188000                0.1513       140.0   \n",
       "Li     2.0       3.489000               85.0000       182.0   \n",
       "Be     2.0       1.824000              190.0000       153.0   \n",
       "B      2.0       1.025000               27.0000       192.0   \n",
       "\n",
       "    vdw_radius_alvarez  vdw_radius_mm3  vdw_radius_uff  sound_velocity  \\\n",
       "H                120.0           162.0           288.6          1270.0   \n",
       "He               143.0           153.0           236.2           970.0   \n",
       "Li               212.0           255.0           245.1          6000.0   \n",
       "Be               198.0           223.0           274.5         13000.0   \n",
       "B                191.0           215.0           408.3         16200.0   \n",
       "\n",
       "    Polarizability  \n",
       "H         0.666793  \n",
       "He        0.205052  \n",
       "Li       24.330000  \n",
       "Be        5.600000  \n",
       "B         3.030000  \n",
       "\n",
       "[5 rows x 58 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "preset.elements_completed.head(5)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
